As you learned in the previous lessons, YOLO is a state-of-the-art, real-time object detection algorithm. In this notebook, we will apply the YOLO algorithm to detect objects in images. We have provided a series of images that you can test the YOLO algorithm on. Below is a list of the available images that you can load:
These images are located in the ./images/ folder. We encourage you to test the YOLO algorithm on your own images as well. Have fun!
We will start by loading the required packages into Python. We will be using OpenCV to load our images, matplotlib to plot them, a utils module that contains some helper functions, and a modified version of Darknet. YOLO uses Darknet, an open-source deep neural network framework written by the creators of YOLO. The version of Darknet used in this notebook has been modified to work in PyTorch 0.4 and has been simplified because we won't be doing any training. Instead, we will be using a set of pre-trained weights that were trained on the Common Objects in Context (COCO) dataset. For more information on Darknet, please visit the Darknet website.
import cv2
import matplotlib.pyplot as plt
from utils import *
from darknet import Darknet
We will be using the latest version of YOLO, known as YOLOv3. We have already downloaded the yolov3.cfg file, which contains the network architecture used by YOLOv3, and placed it in the /cfg/ folder. Similarly, we have placed the yolov3.weights file, which contains the pre-trained weights, in the /weights/ directory. Finally, the /data/ directory contains the coco.names file, which lists the 80 object classes that the weights were trained to detect.
In the code below, we start by specifying the location of the files that contain the neural network architecture, the pre-trained weights, and the object classes. We then use Darknet to set up the neural network using the network architecture specified in the cfg_file. Next, we use the .load_weights() method to load our set of pre-trained weights into the model. Finally, we use the load_class_names() function from the utils module to load the 80 object classes.
# Set the location and name of the cfg file
cfg_file = './cfg/yolov3.cfg'
# Set the location and name of the pre-trained weights file
weight_file = './weights/yolov3.weights'
# Set the location and name of the COCO object classes file
namesfile = './data/coco.names'
# Load the network architecture
m = Darknet(cfg_file)
# Load the pre-trained weights
m.load_weights(weight_file)
# Load the COCO object classes
class_names = load_class_names(namesfile)
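The load_class_names() helper lives in the utils module, and its exact implementation may differ from what follows. As a minimal sketch, assuming coco.names stores one class name per line, reading the file into a list is enough, and the position of each name in that list is the class id that YOLO reports for a box. The helper name load_class_names_sketch() and the three-line stand-in file are hypothetical.

```python
import tempfile
from pathlib import Path


def load_class_names_sketch(path):
    """Hypothetical re-implementation: one class name per line, blank lines skipped."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


# Stand-in for coco.names containing only its first three classes
with tempfile.TemporaryDirectory() as tmp:
    names_path = Path(tmp) / "coco_subset.names"
    names_path.write_text("person\nbicycle\ncar\n", encoding="utf-8")
    class_names_demo = load_class_names_sketch(names_path)

print(class_names_demo)     # ['person', 'bicycle', 'car']
print(class_names_demo[2])  # class id 2 maps to 'car'
```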
Now that the neural network has been set up, we can see what it looks like. We can print the network using the .print_network() method.
# Print the neural network used in YOLOv3
m.print_network()
As we can see, the neural network used by YOLOv3 consists mainly of convolutional layers, with some shortcut connections and upsample layers. For a full description of this network, please refer to the YOLOv3 paper.
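The layer printout does not directly say how many boxes the network predicts per image. A back-of-the-envelope calculation, assuming the standard YOLOv3 COCO setup (a 416 x 416 input, three detection scales with strides 32, 16 and 8, three anchor boxes per scale, and 80 classes), shows why the convolutions feeding each yolo layer have 255 filters and why so many raw boxes must later be pruned by Non-Maximal Suppression.

```python
# Assumed standard YOLOv3 COCO configuration
input_size = 416
strides = [32, 16, 8]          # downsampling factor at each detection scale
anchors_per_scale = 3
num_classes = 80

# Each predicted box carries x, y, w, h, an objectness score, and one score per class
values_per_box = 5 + num_classes                          # 85
channels_per_scale = anchors_per_scale * values_per_box   # 255 filters per detection layer

grid_sizes = [input_size // s for s in strides]           # 13 x 13, 26 x 26, 52 x 52 grids
total_boxes = sum(g * g * anchors_per_scale for g in grid_sizes)

print(grid_sizes, channels_per_scale, total_boxes)  # [13, 26, 52] 255 10647
```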
In the code below, we load our images using OpenCV's cv2.imread() function. Since this function loads images in BGR order, we convert our images to RGB so that we can display them with the correct colors.
As we saw in the output of print_network(), the input size of the first layer of the network is 416 x 416 x 3. Since our images come in different sizes, we have to resize them to be compatible with the input size of the first layer in the network. In the code below, we resize our images using OpenCV's cv2.resize() function. We then plot the original and resized images.
# Set the default figure size
plt.rcParams['figure.figsize'] = [24.0, 14.0]
# Load the image
img = cv2.imread('./images/surf.jpg')
# Convert the image to RGB
original_image = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
# We resize the image to the input width and height of the first layer of the network.
resized_image = cv2.resize(original_image, (m.width, m.height))
# Display the images
plt.subplot(121)
plt.title('Original Image')
plt.imshow(original_image)
plt.subplot(122)
plt.title('Resized Image')
plt.imshow(resized_image)
plt.show()
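Both preprocessing steps in the cell above can be mimicked on a tiny image stored as nested Python lists. This is only an illustration: cv2.cvtColor() and cv2.resize() operate on NumPy arrays, and cv2.resize() uses bilinear interpolation (cv2.INTER_LINEAR) by default, whereas the hypothetical resize_nearest() helper below swaps in simpler nearest-neighbour sampling. The sketch shows that converting BGR to RGB simply reverses the channel order of each pixel, and that resizing to a fixed (width, height) squeezes or stretches a non-square image instead of preserving its aspect ratio.

```python
def bgr_to_rgb(image):
    """Reverse each pixel's channel order, which is what BGR -> RGB means for 3-channel pixels."""
    return [[pixel[::-1] for pixel in row] for row in image]


def resize_nearest(image, new_width, new_height):
    """Nearest-neighbour resize; the size is given as (width, height), the order cv2.resize() expects."""
    old_height, old_width = len(image), len(image[0])
    return [
        [image[r * old_height // new_height][c * old_width // new_width] for c in range(new_width)]
        for r in range(new_height)
    ]


# A 2 x 4 image (2 rows, 4 columns) in BGR order: blue, green, red, white in each row
blue, green, red, white = [255, 0, 0], [0, 255, 0], [0, 0, 255], [255, 255, 255]
bgr = [[blue, green, red, white],
       [blue, green, red, white]]

rgb = bgr_to_rgb(bgr)
print(rgb[0][0], rgb[0][2])  # [0, 0, 255] [255, 0, 0]  (blue and red, now in RGB order)

# Forcing the 4-pixel-wide image into a 2 x 2 square distorts it: every other column is dropped
square = resize_nearest(rgb, 2, 2)
print(square[0])  # [[0, 0, 255], [255, 0, 0]]
```

With the NumPy array returned by cv2.imread(), the same channel reversal can also be written as img[:, :, ::-1].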
As you learned in the previous lessons, YOLO uses Non-Maximal Suppression (NMS) to keep only the best bounding box for each object. The first step in NMS is to remove all the predicted bounding boxes whose detection probability is less than a given NMS threshold. In the code below, we set this NMS threshold to 0.6, which means that all predicted bounding boxes with a detection probability less than 0.6 will be removed.
# Set the NMS threshold
nms_thresh = 0.6
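A minimal sketch of this first filtering step, assuming each box uses the 7-value layout [x, y, w, h, detection confidence, class probability, class id] and that the detection confidence (index 4) is the value compared against nms_thresh. The candidate boxes and the helper name filter_by_confidence() are hypothetical.

```python
# Hypothetical raw predictions: [x, y, w, h, detection_conf, class_prob, class_id]
candidate_boxes = [
    [0.50, 0.50, 0.20, 0.40, 0.95, 0.98, 0],
    [0.52, 0.49, 0.22, 0.38, 0.70, 0.97, 0],
    [0.10, 0.80, 0.05, 0.05, 0.30, 0.60, 2],
]


def filter_by_confidence(boxes, threshold):
    """Drop every box whose detection confidence (index 4) is below the threshold."""
    return [box for box in boxes if box[4] >= threshold]


kept = filter_by_confidence(candidate_boxes, 0.6)
print(len(kept))  # 2: the box with confidence 0.30 is removed
```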
After removing all the predicted bounding boxes that have a low detection probability, the second step in NMS is to repeatedly select the bounding box with the highest detection probability and eliminate all the remaining bounding boxes whose Intersection over Union (IOU) with it is higher than a given IOU threshold. In the code below, we set this IOU threshold to 0.4. This means that all predicted bounding boxes that have an IOU value greater than 0.4 with respect to the best bounding boxes will be removed.
In the utils module you will find the nms() function, which performs the second step of Non-Maximal Suppression, and the boxes_iou() function, which calculates the Intersection over Union of two given bounding boxes. You are encouraged to look at these functions to see how they work.
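For comparison with boxes_iou(), here is a minimal IoU sketch for boxes stored as (x_center, y_center, width, height), the center-based format this notebook describes. The function name iou_center() is hypothetical and the real helper may differ in details. IoU is a ratio of areas, so it gives the same result for normalized or pixel coordinates.

```python
def iou_center(box1, box2):
    """Intersection over Union of two boxes given as (x_center, y_center, width, height, ...)."""
    x1, y1, w1, h1 = box1[:4]
    x2, y2, w2, h2 = box2[:4]
    # Width and height of the overlapping region; zero or negative means no overlap
    overlap_w = min(x1 + w1 / 2, x2 + w2 / 2) - max(x1 - w1 / 2, x2 - w2 / 2)
    overlap_h = min(y1 + h1 / 2, y2 + h2 / 2) - max(y1 - h1 / 2, y2 - h2 / 2)
    if overlap_w <= 0 or overlap_h <= 0:
        return 0.0
    intersection = overlap_w * overlap_h
    union = w1 * h1 + w2 * h2 - intersection
    return intersection / union


# Two 20 x 20 boxes shifted by half their width: overlap 200, union 600
print(round(iou_center([50, 50, 20, 20], [60, 50, 20, 20]), 3))  # 0.333
```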
# Set the IOU threshold
iou_thresh = 0.4
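The second NMS step can be sketched as a greedy loop: sort the boxes by detection confidence, keep the best one, discard every remaining box whose IoU with it exceeds iou_thresh, and repeat. This is a class-agnostic sketch with hypothetical names (greedy_nms(), detections), assuming the detection confidence sits at index 4; it is not necessarily how the nms() function in utils is written, and some implementations run NMS separately for each class.

```python
def iou_center(a, b):
    """IoU of two (x_center, y_center, width, height, ...) boxes."""
    ow = min(a[0] + a[2] / 2, b[0] + b[2] / 2) - max(a[0] - a[2] / 2, b[0] - b[2] / 2)
    oh = min(a[1] + a[3] / 2, b[1] + b[3] / 2) - max(a[1] - a[3] / 2, b[1] - b[3] / 2)
    if ow <= 0 or oh <= 0:
        return 0.0
    inter = ow * oh
    return inter / (a[2] * a[3] + b[2] * b[3] - inter)


def greedy_nms(boxes, iou_threshold):
    """Keep the most confident box, drop boxes overlapping it by more than the threshold, repeat."""
    remaining = sorted(boxes, key=lambda box: box[4], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [box for box in remaining if iou_center(best, box) <= iou_threshold]
    return kept


# Two heavily overlapping detections of one object (IoU about 0.82) and one separate object
detections = [
    [50, 50, 20, 20, 0.95, 0.98, 0],
    [52, 50, 20, 20, 0.80, 0.97, 0],
    [150, 150, 20, 20, 0.70, 0.90, 2],
]
kept_boxes = greedy_nms(detections, 0.4)
print([box[4] for box in kept_boxes])  # [0.95, 0.7]
```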
Once the image has been loaded and resized, and you have chosen values for nms_thresh and iou_thresh, we can use the YOLO algorithm to detect objects in the image. We detect the objects using the detect_objects(m, resized_image, iou_thresh, nms_thresh) function from the utils module. This function takes in the model m returned by Darknet, the resized image, and the IOU and NMS thresholds, and returns the bounding boxes of the objects found.
Each bounding box contains 7 parameters: the coordinates (x, y) of the center of the bounding box, the width w and height h of the bounding box, the confidence detection level, the object class probability, and the object class id. The detect_objects() function also prints out the time it took for the YOLO algorithm to detect the objects in the image and the number of objects detected. Since we are running the algorithm on a CPU, it takes about 2 seconds to detect the objects in an image; on a GPU it would run much faster.
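Detection runs on the 416 x 416 resized image, yet the boxes are later drawn on original_image, so the box coordinates have to be mapped back to the original size at some point. A minimal sketch, assuming (not verified here) that x, y, w and h are normalized to the range 0 to 1 relative to the image: the hypothetical helper to_pixel_corners() turns one center-based box into integer pixel corners for an image of any size.

```python
def to_pixel_corners(box, image_width, image_height):
    """Convert a normalized (x_center, y_center, w, h, ...) box to pixel corners (x1, y1, x2, y2)."""
    x, y, w, h = box[:4]
    x1 = round((x - w / 2) * image_width)
    y1 = round((y - h / 2) * image_height)
    x2 = round((x + w / 2) * image_width)
    y2 = round((y + h / 2) * image_height)
    return x1, y1, x2, y2


# Hypothetical box centred in a 640 x 480 image, a quarter of its width and half its height
example_box = [0.5, 0.5, 0.25, 0.5, 0.99, 0.97, 0]
print(to_pixel_corners(example_box, 640, 480))  # (240, 120, 400, 360)
```

Under this assumption the same box can be drawn on the resized image or on the original one simply by passing a different width and height.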
Once we have the bounding boxes of the objects found by YOLO, we can print the classes of the objects found and their corresponding object class probabilities. To do this, we use the print_objects() function in the utils module.
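Printing the results only requires looking up each box's class id (index 6) in class_names and reading its class probability (index 5). The sketch below, with the hypothetical name format_objects(), builds one line per box; the exact layout that print_objects() uses may differ.

```python
def format_objects(boxes, class_names):
    """Return one 'index. class: probability' line per box (layout is illustrative only)."""
    lines = []
    for i, box in enumerate(boxes, start=1):
        class_id = int(box[6])
        class_prob = box[5]
        lines.append(f"{i}. {class_names[class_id]}: {class_prob:.3f}")
    return lines


names = ["person", "bicycle", "car"]
boxes_demo = [
    [0.50, 0.50, 0.20, 0.40, 0.95, 0.98, 0],
    [0.10, 0.80, 0.30, 0.20, 0.90, 0.87, 2],
]
lines = format_objects(boxes_demo, names)
print("\n".join(lines))
# 1. person: 0.980
# 2. car: 0.870
```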
Finally, we use the plot_boxes() function to plot the bounding boxes and corresponding object class labels found by YOLO in our image. If you set the plot_labels flag to False, the bounding boxes are displayed without labels, which makes them easier to see when your nms_thresh is too low. The plot_boxes() function uses the same color for all bounding boxes of the same object class. However, if you want all bounding boxes to be the same color, you can use the color keyword to set the desired color. For example, if you want all the bounding boxes to be red, you can use:
plot_boxes(original_image, boxes, class_names, plot_labels=True, color=(1, 0, 0))
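How plot_boxes() chooses its per-class colors is not shown here. One hypothetical way to guarantee that every box of a given class gets the same color is to derive the color deterministically from the class id, for example by spreading the 80 classes evenly around the hue circle with Python's colorsys module. The (r, g, b) values are on a 0 to 1 scale, matching color=(1, 0, 0) for red; the function name class_color() is an assumption for illustration.

```python
import colorsys


def class_color(class_id, num_classes=80):
    """Map a class id to a fixed, fully saturated (r, g, b) color with components in [0, 1]."""
    hue = (class_id % num_classes) / num_classes
    return colorsys.hsv_to_rgb(hue, 1.0, 1.0)


print(class_color(0))  # (1.0, 0.0, 0.0): class 0 is drawn in red
```

Because the color depends only on the class id, two boxes of the same class always share a color, while different classes get different hues.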
You are encouraged to change the iou_thresh and nms_thresh parameters to see how they affect the YOLO detection algorithm. The default values of iou_thresh = 0.4 and nms_thresh = 0.6 work well for detecting objects in many different kinds of images. In the cell below, we have repeated some of the code used before so that you don't have to scroll up and down when you want to change the iou_thresh and nms_thresh parameters or the image. Have fun!
# Set the default figure size
plt.rcParams['figure.figsize'] = [24.0, 14.0]
# Load the images
img1 = cv2.imread('./images/market3.png')
img2 = cv2.imread('./images/city_scene.jpg')
# Convert the images to RGB
original_image1 = cv2.cvtColor(img1, cv2.COLOR_BGR2RGB)
original_image2 = cv2.cvtColor(img2, cv2.COLOR_BGR2RGB)
# Resize the images to the input width and height of the first layer of the network.
resized_image1 = cv2.resize(original_image1, (m.width, m.height))
resized_image2 = cv2.resize(original_image2, (m.width, m.height))
# Set the IOU threshold. Default value is 0.4
iou_thresh = 0.5
# Set the NMS threshold. Default value is 0.6
nms_thresh = 0.6
# Detect objects in the image
boxes1 = detect_objects(m, resized_image1, iou_thresh, nms_thresh)
boxes2 = detect_objects(m, resized_image2, iou_thresh, nms_thresh)
# Print the objects found and the confidence level
print_objects(boxes1, class_names)
print("---------------------------------------------")
print_objects(boxes2, class_names)
# Plot the images with bounding boxes and corresponding object class labels
plot_boxes(original_image1, boxes1, class_names, plot_labels=True)
plot_boxes(original_image2, boxes2, class_names, plot_labels=True)